Qualitative Data Cleaning

Authors

  • Xu Chu
  • Ihab F. Ilyas
Abstract

Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. A data cleaning exercise often consists of two phases: error detection and error repairing. Error detection techniques can be either quantitative or qualitative, and error repairing is performed by applying data transformation scripts, by involving human experts, or sometimes both. In this tutorial, we discuss the main facets and directions in designing qualitative data cleaning techniques. We present a taxonomy of current qualitative error detection techniques, as well as a taxonomy of current data repairing techniques. We also discuss proposals for tackling the challenges of cleaning “big data” in terms of scale and distribution.

Similar resources

Chemical cleaning of potable water membranes: a review

The literature on chemical cleaning of polymeric hollow fibre ultrafiltration and microfiltration membranes used in the filtration of water for municipal water supply is reviewed. The review considers the chemical cleaning mechanism, and the perceived link between this and membrane fouling by natural organic matter (NOM) – the principal foulant in municipal potable water applications. Existing ...

QUALITATIVE INTERVIEWING IN INTERNET STUDIES Playing with the media, playing with the method

This methodological paper addresses practical strategies, implications, benefits and drawbacks of collecting qualitative semi-structured interview data about Internet-based research topics using four different interaction systems: face to face; telephone; email; and instant messaging. The discussion presented here is based on a review of the literature and reflection on the experiences of the a...

The importance of cleaning for the overall results of processing endoscopes.

Reprocessing comprises three steps: cleaning, disinfection and, if required, sterilisation. While the extents of disinfection and of sterilisation are quantitatively defined, there are only imprecise (qualitative) definitions of cleaning. There are two main reasons for accurate cleaning. First, organic and inorganic materials that remain on inner and outer surfaces will interfere with the efficacy...

Characterization of occupational exposures to cleaning products used for common cleaning tasks: a pilot study of hospital cleaners

BACKGROUND: In recent years, cleaning has been identified as an occupational risk because of an increased incidence of reported respiratory effects, such as asthma and asthma-like symptoms among cleaning workers. Due to the lack of systematic occupational hygiene analyses and workplace exposure data, it is not clear which cleaning-related exposures induce or aggravate asthma and other respirator...

Comparative Analysis of Different Imputation Methods to Treat Missing Values in Data Mining Environment

Data cleaning is one of the important steps of the KDD (knowledge discovery in databases) process. One critical problem in data cleaning is the presence of missing values. Various approaches have been proposed to find and replace such missing data, including use of the mean value, use of a global constant, and replacement by the most probable value. Imputation is one of the important procedures in statistics that is used ...

Journal:
  • PVLDB

Volume 9

Publication date: 2016